Skip to content

[VectorDistribute] Allow duplication of operation chains#23937

Open
sommerlukas wants to merge 3 commits intoiree-org:mainfrom
sommerlukas:layout-analysis-duplicate-chains
Open

[VectorDistribute] Allow duplication of operation chains#23937
sommerlukas wants to merge 3 commits intoiree-org:mainfrom
sommerlukas:layout-analysis-duplicate-chains

Conversation

@sommerlukas
Copy link
Copy Markdown
Contributor

If a value gets assigned two different layouts, the VectorLayoutAnalysis tries to duplicate that value if it is easily duplicatable to avoid layout transformation via LDS.

So far, this duplication was limited to single operations. This PR extends this to bounded chains of easy to duplicate/recompute values.

The motivation for this change is masks. CSE will deduplicate create_mask operations. After early materialization of create_mask operations, the mask is transformed to a chain of operations:

%step = vector.step : vector<64xindex>
%limit = vector.broadcast %n : index to vector<64xindex>
%mask_1d = arith.cmpi slt, %step, %limit : vector<64xindex>
%mask = vector.broadcast %mask_1d : vector<64xi1> to vector<16x64xi1>

As the mask is used for different operations, it will receive multiple different layouts. Therefore, we would materialize these masks to LDS for layout transformation.

By allowing chains of cheap operations to be duplicated, we avoid that materialization in LDS.

This is part of #23415.

Assisted-by: Claude Code

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Copy link
Copy Markdown
Contributor

@Groverkss Groverkss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feel free to land after addressing comments


/// Maximum length of chains of cheap-to-compute operations that get duplicated
/// for layout conflict resolution.
static constexpr unsigned kMaxChainLength = 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for unsigned? Can we just use int? (No need to reply, just keep it if you have a reason)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place we use it is for a comparison with size(), so I now decided to make it size_t to match the return type.

Comment on lines +484 to +498
static bool isCheapToClone(Operation *op) {
if (isDuplicatableLeaf(op)) {
return true;
}
return isPure(op) &&
(isa<vector::BroadcastOp, vector::TransposeOp, vector::ShapeCastOp>(
op) ||
OpTrait::hasElementwiseMappableTraits(op));
}

/// Collect a chain of ops that can be cloned together. Starting from `op`,
/// walk backward through single-result, cheap-to-clone ops until we reach
/// duplicatable leaves, constants, or non-vector operands. Returns true if
/// the entire chain is safe to clone. Shared intermediates (with multiple
/// uses) are allowed because all ops in the chain are cheap to duplicate.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why single result is important here. We happen to have most things as single result today, but just curious why.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment was outdated, the code didn't check for a single result anymore, but I forgot to update the comment.

Comment on lines +524 to +527
// Non-vector operands (scalars, indices) don't need cloning.
if (!isa<VectorType>(operand.getType())) {
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably say this for any vector with 1 element as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I've added an early continue for single-element vectors.

Comment on lines +581 to +583
for (Operation *op : llvm::reverse(chain)) {
fixupOp(mapping.lookup(op->getResult(0)).getDefiningOp());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment on why this will not recursively loop?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants